Overview

Dataset statistics

Number of variables: 15
Number of observations: 2500
Missing cells: 95
Missing cells (%): 0.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 293.1 KiB
Average record size in memory: 120.1 B
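The overview figures can be cross-checked against the per-variable sections. Missing cells is the sum of every variable's Missing count, and Missing cells (%) divides that sum by the 2500 × 15 = 37,500 cells in the table. The short sketch below copies the per-variable counts reported in this document and assumes nothing else. The average record size is only approximate because the 293.1 KiB total is itself rounded.

```python
# Per-variable missing counts, copied from the variable sections of this report.
missing = {
    "LoanID": 0,
    "Amount.Requested": 1,
    "Amount.Funded.By.Investors": 1,
    "Interest.Rate": 2,
    "Loan.Length": 0,
    "Loan.Purpose": 0,
    "Debt.To.Income.Ratio": 1,
    "State": 0,
    "Home.Ownership": 1,
    "Monthly.Income": 1,
    "FICO.Range": 2,
    "Open.CREDIT.Lines": 3,
    "Revolving.CREDIT.Balance": 3,
    "Inquiries.in.the.Last.6.Months": 3,
    "Employment.Length": 77,
}
n_rows = 2500
n_vars = len(missing)  # 15 variables

total_missing = sum(missing.values())
missing_pct = 100 * total_missing / (n_rows * n_vars)

# Average record size: the (rounded) 293.1 KiB total spread over 2,500 rows.
avg_record_bytes = 293.1 * 1024 / n_rows

print(total_missing)                # 95
print(f"{missing_pct:.1f}%")        # 0.3%
print(f"{avg_record_bytes:.1f} B")  # 120.1 B
```

Employment.Length alone accounts for 77 of the 95 missing cells.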

Variable types

Categorical: 8
Numeric: 7

Warnings

Interest.Rate has high cardinality: 275 distinct values (High cardinality)
Debt.To.Income.Ratio has high cardinality: 1669 distinct values (High cardinality)
Amount.Funded.By.Investors is highly correlated with Amount.Requested (High correlation)
Amount.Requested is highly correlated with Amount.Funded.By.Investors (High correlation)
Employment.Length has 77 (3.1%) missing values (Missing)
Debt.To.Income.Ratio is uniformly distributed (Uniform)
LoanID has all unique values (Unique)
Revolving.CREDIT.Balance has 39 (1.6%) zeros (Zeros)
Inquiries.in.the.Last.6.Months has 1249 (50.0%) zeros (Zeros)

Reproduction

Analysis started: 2020-12-03 09:28:43.264286
Analysis finished: 2020-12-03 09:29:08.497785
Duration: 25.23 seconds
Software version: pandas-profiling v2.9.0

Variables

LoanID
Real number (ℝ≥0)

UNIQUE

Distinct: 2500
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1250.5
Minimum: 1
Maximum: 2500
Zeros: 0
Zeros (%): 0.0%
Memory size: 19.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 125.95
Q1: 625.75
Median: 1250.5
Q3: 1875.25
95th percentile: 2375.05
Maximum: 2500
Range: 2499
Interquartile range (IQR): 1249.5

Descriptive statistics

Standard deviation: 721.8321596
Coefficient of variation (CV): 0.5772348338
Kurtosis: -1.2
Mean: 1250.5
Median absolute deviation (MAD): 625
Skewness: 0
Sum: 3126250
Variance: 521041.6667
Monotonicity: Strictly increasing
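Because LoanID is simply the integers 1 to 2500 (2500 distinct values, strictly increasing), its statistics can be reproduced exactly, which makes it a convenient check on the conventions behind the report. The sketch below uses only Python's `statistics` module. It assumes the report uses the sample (n - 1) variance and linearly interpolated quantiles; the numbers above are consistent with both.

```python
import statistics

# LoanID is the integers 1..2500 (Distinct: 2500, Monotonicity: strictly increasing).
loan_ids = list(range(1, 2501))

mean = statistics.mean(loan_ids)
std = statistics.stdev(loan_ids)          # sample standard deviation (n - 1 denominator)
variance = statistics.variance(loan_ids)  # sample variance (n - 1 denominator)
cv = std / mean                           # coefficient of variation = std / mean
median = statistics.median(loan_ids)
# Median absolute deviation: median of |x - median| (not the mean absolute deviation).
mad = statistics.median([abs(x - median) for x in loan_ids])

# 19 cut points at 5%, 10%, ..., 95%; "inclusive" interpolates linearly between data points.
cuts = statistics.quantiles(loan_ids, n=20, method="inclusive")
p05, q1, q3, p95 = cuts[0], cuts[4], cuts[14], cuts[18]
iqr = q3 - q1

print(mean, median, mad)      # 1250.5 1250.5 625.0
print(f"{std:.4f} {cv:.4f}")  # 721.8322 0.5772
print(p05, q1, q3, p95, iqr)  # 125.95 625.75 1875.25 2375.05 1249.5
```

The agreement pins down the conventions: a population (n) variance would give 520833.25 rather than 521041.6667, and quantiles that do not interpolate would not produce fractional values such as 125.95.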
Histogram with fixed size bins (bins=50)
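Common values

Value | Count | Frequency (%)
2047 | 1 | < 0.1%
1208 | 1 | < 0.1%
1222 | 1 | < 0.1%
1220 | 1 | < 0.1%
1218 | 1 | < 0.1%
1216 | 1 | < 0.1%
1214 | 1 | < 0.1%
1212 | 1 | < 0.1%
1210 | 1 | < 0.1%
1206 | 1 | < 0.1%
Other values (2490) | 2490 | 99.6%

Minimum 5 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
2500 | 1 | < 0.1%
2499 | 1 | < 0.1%
2498 | 1 | < 0.1%
2497 | 1 | < 0.1%
2496 | 1 | < 0.1%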

Amount.Requested
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 380
Distinct (%): 15.2%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 12405.46218
Minimum: 1000
Maximum: 35000
Zeros: 0
Zeros (%): 0.0%
Memory size: 19.5 KiB

Quantile statistics

Minimum: 1000
5th percentile: 2867.5
Q1: 6000
Median: 10000
Q3: 17000
95th percentile: 28000
Maximum: 35000
Range: 34000
Interquartile range (IQR): 11000

Descriptive statistics

Standard deviation: 7802.933666
Coefficient of variation (CV): 0.6289917739
Kurtosis: 0.3073694982
Mean: 12405.46218
Median absolute deviation (MAD): 5000
Skewness: 0.9133255208
Sum: 31001250
Variance: 60885773.8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
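Common values

Value | Count | Frequency (%)
10000 | 206 | 8.2%
12000 | 151 | 6.0%
5000 | 110 | 4.4%
20000 | 107 | 4.3%
6000 | 103 | 4.1%
15000 | 97 | 3.9%
8000 | 90 | 3.6%
25000 | 65 | 2.6%
7000 | 54 | 2.2%
16000 | 53 | 2.1%
Other values (370) | 1463 | 58.5%

Minimum 5 values

Value | Count | Frequency (%)
1000 | 13 | 0.5%
1125 | 1 | < 0.1%
1200 | 6 | 0.2%
1400 | 3 | 0.1%
1450 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
35000 | 51 | 2.0%
34500 | 1 | < 0.1%
33600 | 3 | 0.1%
33500 | 1 | < 0.1%
33000 | 1 | < 0.1%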
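With fixed-size bins, the span from the minimum to the maximum is cut into equal-width intervals. If the 50 bins of the Amount.Requested histogram cover exactly [1000, 35000], each bar is (35000 - 1000) / 50 = 680 wide. The sketch below assigns a value to its bin using the common convention, documented for NumPy's `histogram`, that every bin is half-open except the last, which also includes the maximum. `bin_index` is a hypothetical helper, and that the report bins exactly this way is an assumption.

```python
def bin_index(value: float, lo: float, hi: float, bins: int) -> int:
    """Return the 0-based bin of `value` among `bins` equal-width bins over [lo, hi]."""
    if not lo <= value <= hi:
        raise ValueError("value outside histogram range")
    width = (hi - lo) / bins
    # Bins are half-open [left, right); the maximum itself goes into the last bin.
    return min(int((value - lo) // width), bins - 1)


LO, HI, BINS = 1000.0, 35000.0, 50  # Amount.Requested minimum, maximum, bins=50
width = (HI - LO) / BINS

print(width)                           # 680.0
print(bin_index(10000, LO, HI, BINS))  # 13, i.e. the bin [9840, 10520)
print(bin_index(35000, LO, HI, BINS))  # 49, the last bin, closed on the right
```

The most common value, 10000 (206 loans), therefore falls in bin 13 together with any other amounts between 9840 and 10520.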
 

Amount.Funded.By.Investors
Real number (ℝ)

HIGH CORRELATION

Distinct: 710
Distinct (%): 28.4%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 12002.37419
Minimum: -0.01
Maximum: 35000
Zeros: 4
Zeros (%): 0.2%
Memory size: 19.5 KiB

Quantile statistics

Minimum: -0.01
5th percentile: 2200
Q1: 6000
Median: 10000
Q3: 16000
95th percentile: 27925
Maximum: 35000
Range: 35000.01
Interquartile range (IQR): 10000

Descriptive statistics

Standard deviation: 7746.767348
Coefficient of variation (CV): 0.6454362469
Kurtosis: 0.4144091354
Mean: 12002.37419
Median absolute deviation (MAD): 5000
Skewness: 0.9320839115
Sum: 29993933.09
Variance: 60012404.34
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
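Common values

Value | Count | Frequency (%)
10000 | 163 | 6.5%
12000 | 108 | 4.3%
5000 | 87 | 3.5%
6000 | 85 | 3.4%
8000 | 69 | 2.8%
15000 | 68 | 2.7%
20000 | 59 | 2.4%
7000 | 40 | 1.6%
4000 | 35 | 1.4%
16000 | 35 | 1.4%
Other values (700) | 1750 | 70.0%

Minimum 5 values

Value | Count | Frequency (%)
-0.01 | 2 | 0.1%
0 | 4 | 0.2%
200 | 1 | < 0.1%
214.02 | 1 | < 0.1%
224.99 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
35000 | 31 | 1.2%
34977.35 | 1 | < 0.1%
34975 | 5 | 0.2%
34950 | 6 | 0.2%
34900 | 1 | < 0.1%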
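Each Frequency (%) entry is the count divided by all 2,500 observations (missing ones included), and the "Other values" row collects everything outside the ten most common values. For Amount.Funded.By.Investors the row can be re-derived from the table; the counts below are copied from it. The `pct` helper is hypothetical and only mimics the displayed formatting, in which a share that rounds to 0.0 is shown as "< 0.1%".

```python
n_rows = 2500
n_missing = 1
n_distinct = 710

# The ten most common Amount.Funded.By.Investors values and their counts.
top_counts = {10000: 163, 12000: 108, 5000: 87, 6000: 85, 8000: 69,
              15000: 68, 20000: 59, 7000: 40, 4000: 35, 16000: 35}

other_count = n_rows - n_missing - sum(top_counts.values())
other_distinct = n_distinct - len(top_counts)


def pct(count: int) -> str:
    """Share of all rows with one decimal; shares that round to 0.0 print as '< 0.1%'."""
    text = f"{100 * count / n_rows:.1f}"
    return "< 0.1%" if text == "0.0" and count > 0 else f"{text}%"


print(f"Other values ({other_distinct}): {other_count} {pct(other_count)}")
# Other values (700): 1750 70.0%
print(pct(163), pct(2), pct(1))  # 6.5% 0.1% < 0.1%
```

This also explains why a count of 2 (for example -0.01) is shown as 0.1% while a count of 1 is shown as < 0.1%: 0.08% rounds up, 0.04% does not.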
 

Interest.Rate
Categorical

HIGH CARDINALITY

Distinct: 275
Distinct (%): 11.0%
Missing: 2
Missing (%): 0.1%
Memory size: 19.5 KiB
Common values

Value | Count | Frequency (%)
12.12% | 122 | 4.9%
7.90% | 119 | 4.8%
13.11% | 115 | 4.6%
15.31% | 76 | 3.0%
14.09% | 72 | 2.9%
14.33% | 69 | 2.8%
8.90% | 64 | 2.6%
11.14% | 58 | 2.3%
6.03% | 57 | 2.3%
17.27% | 56 | 2.2%
Other values (265) | 1690 | 67.6%
Unique

Unique: 69 (values occurring exactly once)
Unique (%): 2.8%

Length

Max length: 6
Median length: 6
Mean length: 5.7536
Min length: 3
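Interest.Rate (and likewise Debt.To.Income.Ratio) is profiled as Categorical because its values are text such as "12.12%". That is why it receives categorical summaries and a high-cardinality warning rather than quantile statistics. If numeric analysis is wanted, the strings can be converted first. Below is a minimal sketch, assuming every non-missing value is a decimal number followed by an optional "%". `parse_percent` is a hypothetical helper, not part of pandas-profiling.

```python
import math
import re

# A number, optionally negative and with decimals, followed by an optional percent sign.
_PERCENT = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*%?\s*$")


def parse_percent(text):
    """Convert a string such as '12.12%' to the float 12.12; missing or unparseable -> NaN."""
    if text is None:
        return math.nan
    match = _PERCENT.match(str(text))
    if match is None:
        return math.nan  # e.g. the string 'NaN'
    return float(match.group(1))


rates = ["12.12%", "7.90%", "17%", "0%", None, "NaN"]
print([parse_percent(r) for r in rates])
# [12.12, 7.9, 17.0, 0.0, nan, nan]
```

With pandas, the same idea is usually written as `Series.str.rstrip("%")` followed by `pd.to_numeric(..., errors="coerce")`.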

Loan.Length
Categorical

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.5 KiB
Common values

Value | Count | Frequency (%)
36 months | 1952 | 78.1%
60 months | 548 | 21.9%
Unique

Unique: 0 (values occurring exactly once)
Unique (%): 0.0%

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Loan.Purpose
Categorical

Distinct: 14
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 19.5 KiB
Common values

Value | Count | Frequency (%)
debt_consolidation | 1307 | 52.3%
credit_card | 444 | 17.8%
other | 201 | 8.0%
home_improvement | 152 | 6.1%
major_purchase | 101 | 4.0%
small_business | 87 | 3.5%
car | 50 | 2.0%
wedding | 39 | 1.6%
medical | 30 | 1.2%
moving | 29 | 1.2%
Other values (4) | 60 | 2.4%
Unique

Unique: 0 (values occurring exactly once)
Unique (%): 0.0%

Length

Max length: 18
Median length: 18
Mean length: 14.3132
Min length: 3

Debt.To.Income.Ratio
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 1669
Distinct (%): 66.8%
Missing: 1
Missing (%): < 0.1%
Memory size: 19.5 KiB
Common values

Value | Count | Frequency (%)
0% | 8 | 0.3%
12.54% | 6 | 0.2%
17.95% | 5 | 0.2%
12.20% | 5 | 0.2%
15.60% | 5 | 0.2%
22.74% | 5 | 0.2%
17% | 5 | 0.2%
15.88% | 5 | 0.2%
12.85% | 5 | 0.2%
16.73% | 5 | 0.2%
Other values (1659) | 2445 | 97.8%
Unique

Unique: 1088 (values occurring exactly once)
Unique (%): 43.5%

Length

Max length: 6
Median length: 6
Mean length: 5.6884
Min length: 2

State
Categorical

Distinct: 46
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 19.5 KiB
Common values

Value | Count | Frequency (%)
CA | 433 | 17.3%
NY | 255 | 10.2%
TX | 174 | 7.0%
FL | 169 | 6.8%
IL | 101 | 4.0%
GA | 98 | 3.9%
PA | 96 | 3.8%
NJ | 94 | 3.8%
VA | 78 | 3.1%
MA | 73 | 2.9%
Other values (36) | 929 | 37.2%
Unique

Unique: 2 (values occurring exactly once)
Unique (%): 0.1%

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Home.Ownership
Categorical

Distinct: 5
Distinct (%): 0.2%
Missing: 1
Missing (%): < 0.1%
Memory size: 19.5 KiB
Common values

Value | Count | Frequency (%)
MORTGAGE | 1147 | 45.9%
RENT | 1146 | 45.8%
OWN | 200 | 8.0%
OTHER | 5 | 0.2%
NONE | 1 | < 0.1%
(Missing) | 1 | < 0.1%
Unique

Unique: 1 (values occurring exactly once)
Unique (%): < 0.1%

Length

Max length: 8
Median length: 4
Mean length: 5.7568
Min length: 3

Monthly.Income
Real number (ℝ≥0)

Distinct: 632
Distinct (%): 25.3%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 5688.931321
Minimum: 588.5
Maximum: 102750
Zeros: 0
Zeros (%): 0.0%
Memory size: 19.5 KiB

Quantile statistics

Minimum: 588.5
5th percentile: 2166.003
Q1: 3500
Median: 5000
Q3: 6800
95th percentile: 11666.703
Maximum: 102750
Range: 102161.5
Interquartile range (IQR): 3300

Descriptive statistics

Standard deviation: 3963.118185
Coefficient of variation (CV): 0.6966366725
Kurtosis: 167.4344468
Mean: 5688.931321
Median absolute deviation (MAD): 1666.67
Skewness: 8.467690017
Sum: 14216639.37
Variance: 15706305.75
Monotonicity: Not monotonic
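The skewness (8.47) and kurtosis (167.4) of Monthly.Income are far from the value of 0 that a normal distribution has for both, which is consistent with the few very large incomes (maximum 102750 against a median of 5000). The sketch below implements the bias-adjusted sample estimators that pandas' `Series.skew` (adjusted Fisher-Pearson G1) and `Series.kurt` (Fisher's excess kurtosis G2) compute. That the report relies on exactly these pandas methods is an assumption, though LoanID's kurtosis of -1.2 above is what G2 gives for evenly spaced data.

```python
import math


def sample_skew(xs):
    """Adjusted Fisher-Pearson skewness G1 (needs n >= 3 and non-zero variance)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def sample_excess_kurtosis(xs):
    """Bias-corrected excess kurtosis G2, 0 for a normal distribution (needs n >= 4)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g2 = m4 / m2 ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))


symmetric = [1, 2, 3, 4, 5]
# Hypothetical six-value sample loosely based on the quartiles and maximum above.
right_tailed = [2500, 3500, 5000, 5000, 6800, 102750]

print(sample_skew(symmetric))                        # 0.0
print(round(sample_excess_kurtosis(symmetric), 6))   # -1.2
print(sample_skew(right_tailed) > 0)                 # True
```

A single extreme value such as 102750 dominates the third and fourth central moments, which is why both statistics are so large here while the median and IQR barely move.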
Histogram with fixed size bins (bins=50)
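Common values

Value | Count | Frequency (%)
5000 | 107 | 4.3%
4166.67 | 84 | 3.4%
3333.33 | 71 | 2.8%
5416.67 | 70 | 2.8%
5833.33 | 58 | 2.3%
3750 | 53 | 2.1%
6666.67 | 52 | 2.1%
2500 | 51 | 2.0%
4583.33 | 50 | 2.0%
6250 | 46 | 1.8%
Other values (622) | 1857 | 74.3%

Minimum 5 values

Value | Count | Frequency (%)
588.5 | 1 | < 0.1%
666.67 | 1 | < 0.1%
833.33 | 1 | < 0.1%
866.67 | 1 | < 0.1%
884.9 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
102750 | 1 | < 0.1%
65000 | 1 | < 0.1%
39583.33 | 1 | < 0.1%
27083.33 | 1 | < 0.1%
25000 | 4 | 0.2%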

FICO.Range
Categorical

Distinct: 38
Distinct (%): 1.5%
Missing: 2
Missing (%): 0.1%
Memory size: 19.5 KiB
Common values

Value | Count | Frequency (%)
670-674 | 171 | 6.8%
675-679 | 166 | 6.6%
680-684 | 157 | 6.3%
695-699 | 153 | 6.1%
665-669 | 145 | 5.8%
690-694 | 140 | 5.6%
685-689 | 136 | 5.4%
705-709 | 134 | 5.4%
700-704 | 131 | 5.2%
660-664 | 125 | 5.0%
Other values (28) | 1040 | 41.6%
Unique

Unique: 3 (values occurring exactly once)
Unique (%): 0.1%

Length

Max length: 7
Median length: 7
Mean length: 6.9968
Min length: 3

Open.CREDIT.Lines
Real number (ℝ≥0)

Distinct: 29
Distinct (%): 1.2%
Missing: 3
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.07288746
Minimum: 2
Maximum: 38
Zeros: 0
Zeros (%): 0.0%
Memory size: 19.5 KiB

Quantile statistics

Minimum: 2
5th percentile: 4
Q1: 7
Median: 9
Q3: 13
95th percentile: 18
Maximum: 38
Range: 36
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.507416186
Coefficient of variation (CV): 0.44748005
Kurtosis: 1.463708626
Mean: 10.07288746
Median absolute deviation (MAD): 3
Skewness: 0.8866909421
Sum: 25152
Variance: 20.31680067
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=29)
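Common values

Value | Count | Frequency (%)
8 | 262 | 10.5%
9 | 237 | 9.5%
6 | 232 | 9.3%
7 | 216 | 8.6%
11 | 187 | 7.5%
10 | 185 | 7.4%
13 | 158 | 6.3%
12 | 153 | 6.1%
5 | 153 | 6.1%
14 | 138 | 5.5%
Other values (19) | 576 | 23.0%

Minimum 5 values

Value | Count | Frequency (%)
2 | 24 | 1.0%
3 | 60 | 2.4%
4 | 106 | 4.2%
5 | 153 | 6.1%
6 | 232 | 9.3%

Maximum 5 values

Value | Count | Frequency (%)
38 | 1 | < 0.1%
36 | 1 | < 0.1%
34 | 1 | < 0.1%
31 | 1 | < 0.1%
26 | 3 | 0.1%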

Revolving.CREDIT.Balance
Real number (ℝ≥0)

ZEROS

Distinct: 2349
Distinct (%): 94.1%
Missing: 3
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 15223.18462
Minimum: 0
Maximum: 270800
Zeros: 39
Zeros (%): 1.6%
Memory size: 19.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 916.2
Q1: 5584
Median: 10948
Q3: 18861
95th percentile: 40768.4
Maximum: 270800
Range: 270800
Interquartile range (IQR): 13277

Descriptive statistics

Standard deviation: 18281.01526
Coefficient of variation (CV): 1.200866685
Kurtosis: 49.15169313
Mean: 15223.18462
Median absolute deviation (MAD): 6191
Skewness: 5.401569499
Sum: 38012292
Variance: 334195518.9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
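Common values

Value | Count | Frequency (%)
0 | 39 | 1.6%
2174 | 3 | 0.1%
12588 | 3 | 0.1%
6055 | 3 | 0.1%
15055 | 3 | 0.1%
7161 | 2 | 0.1%
14268 | 2 | 0.1%
6969 | 2 | 0.1%
20694 | 2 | 0.1%
1442 | 2 | 0.1%
Other values (2339) | 2436 | 97.4%
(Missing) | 3 | 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 39 | 1.6%
1 | 1 | < 0.1%
7 | 1 | < 0.1%
9 | 1 | < 0.1%
16 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
270800 | 1 | < 0.1%
245886 | 1 | < 0.1%
217827 | 1 | < 0.1%
216561 | 1 | < 0.1%
194205 | 1 | < 0.1%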

Inquiries.in.the.Last.6.Months
Real number (ℝ≥0)

ZEROS

Distinct: 10
Distinct (%): 0.4%
Missing: 3
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9066880256
Minimum: 0
Maximum: 9
Zeros: 1249
Zeros (%): 50.0%
Memory size: 19.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 3
Maximum: 9
Range: 9
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.231149256
Coefficient of variation (CV): 1.357853221
Kurtosis: 6.545444299
Mean: 0.9066880256
Median absolute deviation (MAD): 0
Skewness: 2.042124554
Sum: 2264
Variance: 1.51572849
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
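Common values

Value | Count | Frequency (%)
0 | 1249 | 50.0%
1 | 657 | 26.3%
2 | 336 | 13.4%
3 | 169 | 6.8%
4 | 50 | 2.0%
5 | 14 | 0.6%
6 | 8 | 0.3%
7 | 7 | 0.3%
9 | 5 | 0.2%
8 | 2 | 0.1%
(Missing) | 3 | 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1249 | 50.0%
1 | 657 | 26.3%
2 | 336 | 13.4%
3 | 169 | 6.8%
4 | 50 | 2.0%

Maximum 5 values

Value | Count | Frequency (%)
9 | 5 | 0.2%
8 | 2 | 0.1%
7 | 7 | 0.3%
6 | 8 | 0.3%
5 | 14 | 0.6%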

Employment.Length
Categorical

MISSING

Distinct: 11
Distinct (%): 0.5%
Missing: 77
Missing (%): 3.1%
Memory size: 19.5 KiB
Common values

Value | Count | Frequency (%)
10+ years | 653 | 26.1%
< 1 year | 250 | 10.0%
2 years | 244 | 9.8%
3 years | 235 | 9.4%
5 years | 202 | 8.1%
4 years | 192 | 7.7%
1 year | 177 | 7.1%
6 years | 163 | 6.5%
7 years | 127 | 5.1%
8 years | 108 | 4.3%
(Missing) | 77 | 3.1%
Unique

Unique: 0 (values occurring exactly once)
Unique (%): 0.0%

Length

Max length: 9
Median length: 7
Mean length: 7.4284
Min length: 3

Interactions

Correlations

Pearson's r

The Pearson correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (for positive scale factors). Consequently, for an exact linear relationship the steepness of the line, that is, its angle to the x-axis, does not affect r: any rising line gives r = +1 and any falling line gives r = -1.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
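The definition above translates directly into code. For illustration, the sketch below applies it to the first ten sample rows of the two variables flagged as highly correlated, Amount.Requested and Amount.Funded.By.Investors. This ten-row value is not the coefficient the report computed on all 2,500 rows, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)


# Amount.Requested and Amount.Funded.By.Investors for LoanID 1-10 (first rows of the sample).
requested = [20000, 19200, 35000, 10000, 12000, 6000, 10000, 33500, 14675, 7000]
funded = [20000, 19200, 35000, 9975, 12000, 6000, 10000, 33450, 14675, 7000]
print(pearson_r(requested, funded) > 0.999)  # True

# Invariance under a separate (positive) change of location and scale of one variable.
scaled = [3 * y + 100 for y in funded]
print(math.isclose(pearson_r(requested, funded), pearson_r(requested, scaled)))  # True
```

In these rows the funded amount equals the requested amount except in two loans, where it is 25 and 50 lower, so r is almost exactly 1.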

Spearman's ρ

The Spearman rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
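Following that description, ρ is simply Pearson's r applied to the ranks. The sketch below gives tied values the average of the ranks they span, the usual convention and the default (`method="average"`) of pandas' `rank`. It shows that a monotonic but nonlinear relationship reaches ρ = 1 while Pearson's r stays below 1. The helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # the 1/(n - 1) factors cancel


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # monotonic but not linear
print(round(spearman_rho(xs, ys), 12))  # 1.0
print(pearson_r(xs, ys) < 1.0)          # True
```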

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, which is n(n-1)/2 for n observations.
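A direct transcription of that counting rule is sketched below. A pair is concordant when X and Y move in the same direction and discordant when they move in opposite directions; pairs tied in either variable count as neither. This is the tau-a form as described. Libraries such as SciPy's `kendalltau` default to the tau-b variant, which adjusts the denominator for ties; without ties the two agree. `kendall_tau_a` is a hypothetical helper name.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs; pairs tied in x or y count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Of the 6 pairs, only (2, 3) is out of order in y: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # (5 - 1) / 6, about 0.667
```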

Phik (φk)

Phik (φk) is a recent, practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The standard empirical estimator of Cramér's V has been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
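A sketch of the bias-corrected estimator follows, using the correction as it is usually stated. The squared coefficient φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and clipped at 0. The row and column counts are adjusted to r - (r-1)²/(n-1) and k - (k-1)²/(n-1) before normalizing by the smaller of the adjusted counts minus one. The helper name and the list-of-lists input are assumptions, and the function expects every row and column total to be positive.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table of counts (positive margins, n > 1)."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Bias corrections for phi^2 and for the effective numbers of rows and columns.
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


print(round(cramers_v_corrected([[50, 0], [0, 50]]), 6))  # 1.0 (perfect association)
print(cramers_v_corrected([[25, 25], [25, 25]]))           # 0.0 (independence)
```

For a weakly associated table such as [[30, 20], [20, 30]] the uncorrected V is 0.2, while the corrected value is lower (about 0.17), illustrating the downward adjustment for small-sample bias.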

Missing values

Sample

First rows

Index | LoanID | Amount.Requested | Amount.Funded.By.Investors | Interest.Rate | Loan.Length | Loan.Purpose | Debt.To.Income.Ratio | State | Home.Ownership | Monthly.Income | FICO.Range | Open.CREDIT.Lines | Revolving.CREDIT.Balance | Inquiries.in.the.Last.6.Months | Employment.Length
0 | 1 | 20000.0 | 20000.0 | 8.90% | 36 months | debt_consolidation | 14.90% | SC | MORTGAGE | 6541.67 | 735-739 | 14.0 | 14272.0 | 2.0 | < 1 year
1 | 2 | 19200.0 | 19200.0 | 12.12% | 36 months | debt_consolidation | 28.36% | TX | MORTGAGE | 4583.33 | 715-719 | 12.0 | 11140.0 | 1.0 | 2 years
2 | 3 | 35000.0 | 35000.0 | 21.98% | 60 months | debt_consolidation | 23.81% | CA | MORTGAGE | 11500.00 | 690-694 | 14.0 | 21977.0 | 1.0 | 2 years
3 | 4 | 10000.0 | 9975.0 | 9.99% | 36 months | debt_consolidation | 14.30% | KS | MORTGAGE | 3833.33 | 695-699 | 10.0 | 9346.0 | 0.0 | 5 years
4 | 5 | 12000.0 | 12000.0 | 11.71% | 36 months | credit_card | 18.78% | NJ | RENT | 3195.00 | 695-699 | 11.0 | 14469.0 | 0.0 | 9 years
5 | 6 | 6000.0 | 6000.0 | 15.31% | 36 months | other | 20.05% | CT | OWN | 4891.67 | 670-674 | 17.0 | 10391.0 | 2.0 | 3 years
6 | 7 | 10000.0 | 10000.0 | 7.90% | 36 months | debt_consolidation | 26.09% | MA | RENT | 2916.67 | 720-724 | 10.0 | 15957.0 | 0.0 | 10+ years
7 | 8 | 33500.0 | 33450.0 | 17.14% | 60 months | credit_card | 14.70% | LA | MORTGAGE | 13863.42 | 705-709 | 12.0 | 27874.0 | 0.0 | 10+ years
8 | 9 | 14675.0 | 14675.0 | 14.33% | 36 months | credit_card | 26.92% | CA | RENT | 3150.00 | 685-689 | 9.0 | 7246.0 | 1.0 | 8 years
9 | 10 | 7000.0 | 7000.0 | 6.91% | 36 months | credit_card | 7.10% | CA | RENT | 5000.00 | 715-719 | 8.0 | 7612.0 | 0.0 | 3 years

Last rows

Index | LoanID | Amount.Requested | Amount.Funded.By.Investors | Interest.Rate | Loan.Length | Loan.Purpose | Debt.To.Income.Ratio | State | Home.Ownership | Monthly.Income | FICO.Range | Open.CREDIT.Lines | Revolving.CREDIT.Balance | Inquiries.in.the.Last.6.Months | Employment.Length
2490 | 2491 | 10000.0 | NaN | 11.71% | 36 months | debt_consolidation | 8.40% | CA | RENT | 4500.00 | 710-714 | 8.0 | 8404.0 | 1.0 | 3 years
2491 | 2492 | 8475.0 | 8475.00 | 7.62% | 36 months | debt_consolidation | 15.88% | CA | RENT | 3983.33 | 720-724 | 9.0 | 6882.0 | NaN | NaN
2492 | 2493 | 6400.0 | 6350.00 | 10.08% | 36 months | debt_consolidation | NaN | NJ | NaN | 5166.67 | 710-714 | 5.0 | 5815.0 | 2.0 | 10+ years
2493 | 2494 | 30000.0 | 30000.00 | 23.28% | 60 months | other | 12.10% | IL | MORTGAGE | 7083.33 | 675-679 | 16.0 | 17969.0 | 1.0 | 10+ years
2494 | 2495 | 24000.0 | 23975.00 | 14.65% | 36 months | debt_consolidation | 15.29% | WA | MORTGAGE | 6666.67 | NaN | 13.0 | 17521.0 | 0.0 | 5 years
2495 | 2496 | 30000.0 | 29950.00 | 16.77% | 60 months | debt_consolidation | 19.23% | NY | MORTGAGE | 9250.00 | 705-709 | 15.0 | 45880.0 | 1.0 | 8 years
2496 | 2497 | 16000.0 | 16000.00 | 14.09% | 60 months | home_improvement | 21.54% | MD | OWN | 8903.25 | 740-744 | 18.0 | 18898.0 | 1.0 | 10+ years
2497 | 2498 | 10000.0 | 10000.00 | 13.99% | 36 months | debt_consolidation | 4.89% | PA | MORTGAGE | 2166.67 | 680-684 | 4.0 | 4544.0 | 0.0 | 10+ years
2498 | 2499 | 6000.0 | 6000.00 | 12.42% | 36 months | major_purchase | 16.66% | NJ | RENT | 3500.00 | 675-679 | 8.0 | 7753.0 | 0.0 | 5 years
2499 | 2500 | 9000.0 | 5242.75 | 13.79% | 36 months | debt_consolidation | 6.76% | NY | RENT | 3875.00 | 670-674 | 7.0 | 7589.0 | 0.0 | 10+ years